[wip] Add trace-aware stage summaries + plotting helper for profiling #1168

jioffe502 · 2025-11-19T22:35:03Z

Description

At the Ray stage we now log every major hop a document takes: queue time before the worker picks it up, the full pdf_extractor resident time, and downstream stages (YOLOX ensembles for tables/charts, OCR/text extraction, metadata construction, embedding, storage, etc.). Those show up in results.wall_time.png and the stage-time bar chart.

Inside the PDF extractor we added sub-spans for the previously opaque rasterization leg: rendering the page via PDFium, copying the bitmap into NumPy, scaling to YOLOX size, and padding. Those spans feed the new PDFium breakdown chart and CSV so we can compare per-document and per-page cost. You can now say “document 2062555.pdf spent ~0.65 s/page in scaling, which is 60% of its PDF extractor time” instead of just “this doc was slow.”

The combination of stage-level metrics (queue/wall/resident) plus the PDFium micro-spans gives a holistic view: you can see how much time each document spends waiting in Ray, how much is consumed by the PDF extractor as a whole, and exactly which sub-step dominates inside that extractor.
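As a rough illustration of the "which sub-step dominates" question, the sketch below aggregates sub-span seconds per document and reports the largest one with its share. The record layout, file names, and span names (`pdfium_render`, `numpy_copy`, `scale`, `pad`) are assumptions made for this example, not the trace keys this PR emits, and the share is computed against the sum of the listed sub-spans rather than the full PDF extractor resident time.

```python
from collections import defaultdict

# Hypothetical flat span records: (document, span name, seconds).
# Names and values are illustrative only, not real trace output.
spans = [
    ("a.pdf", "pdfium_render", 0.20),
    ("a.pdf", "numpy_copy", 0.05),
    ("a.pdf", "scale", 0.60),
    ("a.pdf", "pad", 0.15),
    ("b.pdf", "pdfium_render", 0.30),
    ("b.pdf", "scale", 0.10),
]


def dominant_substep(records):
    """Return {document: (span_name, seconds, share)} for the largest sub-span.

    share is the span's fraction of the summed sub-span time for that document.
    """
    per_doc = defaultdict(lambda: defaultdict(float))
    for doc, name, seconds in records:
        per_doc[doc][name] += seconds

    result = {}
    for doc, totals in per_doc.items():
        total = sum(totals.values())
        name, seconds = max(totals.items(), key=lambda kv: kv[1])
        share = seconds / total if total > 0 else 0.0
        result[doc] = (name, seconds, share)
    return result


if __name__ == "__main__":
    for doc, (name, seconds, share) in dominant_substep(spans).items():
        print(f"{doc}: {name} took {seconds:.2f} s ({share:.0%} of sub-span time)")
```

Dividing by the extractor's own resident time instead of the sub-span sum would also expose any time not covered by a sub-span.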

Task List

Plumbed enable_traces/trace_output_dir through the test config and e2e case so trace payloads are captured automatically during scripted runs.
Added trace_summary generation in scripts/tests/cases/e2e.py, writing per-stage aggregates plus per-document totals; run.py now records trace flags in results.json.
Documented how to enable tracing, run baseline vs RC comparisons, and consume the new artifacts (README updates + profiling workflow notes).
Introduced scripts/tests/tools/plot_stage_totals.py, a helper that reads any results.json and emits a PNG + textual summary showing cumulative resident seconds per stage (with options to sort, collapse nested entries, filter network noise, etc.).
Document-level wall time: _summarize_traces now records each doc’s elapsed span (first stage entry → last exit) so the trace artifacts report realistic wall clocks in addition to resident totals.
Dual visualization flow: plot_stage_totals.py grew an optional wall-time chart (results.wall_time.png) that contrasts per-doc wall vs resident seconds, highlights effective parallelism (resident/wall ratio), and prints summaries alongside the existing stage-resident PNG.
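The wall-versus-resident comparison described in the last two items can be sketched as follows. This is a minimal illustration under assumed inputs, not the code in `_summarize_traces` or `plot_stage_totals.py`: the `(stage, entry_s, exit_s)` tuple layout and every stage name other than `pdf_extractor` are hypothetical.

```python
def summarize_document(stage_spans):
    """Summarize one document from (stage, entry_s, exit_s) tuples.

    wall_s      : first stage entry to last stage exit
    resident_s  : sum of time spent inside each stage
    parallelism : resident_s / wall_s (above 1.0 means stages overlapped)
    """
    if not stage_spans:
        return {"wall_s": 0.0, "resident_s": 0.0, "parallelism": 0.0}

    first_entry = min(entry for _, entry, _ in stage_spans)
    last_exit = max(exit_ for _, _, exit_ in stage_spans)
    wall = last_exit - first_entry
    resident = sum(exit_ - entry for _, entry, exit_ in stage_spans)
    ratio = resident / wall if wall > 0 else 0.0
    return {"wall_s": wall, "resident_s": resident, "parallelism": ratio}


if __name__ == "__main__":
    # Hypothetical timings: two downstream stages run concurrently.
    doc_spans = [
        ("pdf_extractor", 0.0, 2.0),
        ("table_extractor", 2.0, 3.0),
        ("chart_extractor", 2.0, 3.5),
    ]
    print(summarize_document(doc_spans))
```

A ratio near 1.0 suggests the document moved through stages one at a time; a ratio below 1.0 indicates gaps between stages, such as queue time, that resident totals alone would hide.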

Testing:

Generated bo20 and bo767 runs with ENABLE_TRACES=true; verified results.json contains the new trace_summary, trace files land under artifacts/.../traces/, and the plotting tool produces the expected charts (*.stage_time.png) using both collapsed and nested views.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
If adjusting docker-compose.yaml environment variables, have you ensured those are mimicked in the Helm values.yaml file?

- Track submission_ts_ns throughout V2 ingest pipeline
- Extract ray_wait_s, in_ray_queue_s, ray_start_ts_s, ray_end_ts_s metrics
- Enhance wall-time visualization with wait and queue time bars
- Add wait/queue time summaries and percentile statistics
- Update documentation with new profiling metrics
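The wait and queue percentile summaries mentioned in these commit notes could be computed along the lines below. This is a sketch only: the sample values are made up, the choice of p50/p90/p99 and the linear-interpolation method are assumptions, and the PR's own tooling may compute its statistics differently. Only the metric names `ray_wait_s` and `in_ray_queue_s` come from the notes above.

```python
def percentile(values, q):
    """Linearly interpolated percentile of values, with q in [0, 100]."""
    if not values:
        raise ValueError("percentile() requires at least one value")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


def summarize(values, qs=(50, 90, 99)):
    """Return {"p50": ..., "p90": ..., "p99": ...} for the given samples."""
    return {f"p{q}": percentile(values, q) for q in qs}


if __name__ == "__main__":
    # Made-up per-document samples in seconds, keyed by metric name.
    metrics = {
        "ray_wait_s": [0.4, 0.1, 0.9, 0.2, 1.5],
        "in_ray_queue_s": [0.05, 0.02, 0.30, 0.01, 0.08],
    }
    for name, samples in metrics.items():
        stats = summarize(samples)
        line = ", ".join(f"{k}={v:.3f}s" for k, v in stats.items())
        print(f"{name}: {line}")
```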

Add trace-aware stage summaries + plotting helper for profiling

99ad3e2

jioffe502 requested a review from a team as a code owner November 19, 2025 22:35

jioffe502 requested a review from jperez999 November 19, 2025 22:35

jioffe502 added 4 commits November 20, 2025 22:16

posthoc util for plotting

19315a1

adding wall time plot + better tracing

59318d9

Merge branch 'main' into ray_profiling

e5d7430

jioffe502 marked this pull request as draft November 25, 2025 16:08

jioffe502 added 2 commits November 25, 2025 21:07

reorganizing code and simplifying

57c0dcd

simplifying plot tool

429513f

jioffe502 changed the title ~~Add trace-aware stage summaries + plotting helper for profiling~~ [wip] Add trace-aware stage summaries + plotting helper for profiling Dec 3, 2025
